(57) Abstract

The present invention provides a novel apparatus for computing products
in Galois fields GF(p^m), with emphasis on the case p = 2. The elements
of the field are represented in polynomial basis and no basis conversion
is required. The apparatus consists of two distinct subunits. The first
subunit simultaneously produces the first m α-multiples of one of the two
elements to be multiplied. The second subunit simultaneously produces
the m inner products of the second element and the m vectors consisting
of suitable components of the above-mentioned α-multiples. Both subunits
are capable of operating over any Galois field GF(p^m) where m is an
integer in the range [2, M]. Consequently, the apparatus is programmable
for operation over any of the above-mentioned Galois fields.
Universal Galois field multiplier

The invention is concerned with the multiplication of two arbitrary
elements belonging to a Galois field, especially an apparatus for
performing such multiplication.

Galois fields are finite fields consisting of p^m elements, where p is a
prime number and m a positive integer. The field GF(2^m) is of particular
importance in practice because its elements can be represented by binary
polynomials of degree at most m-1 in a particular primitive element. This
primitive element is a root of the irreducible primitive polynomial of
degree m that generates the Galois field.

Galois fields are of fundamental importance in the construction,
encoding and decoding of several classes of powerful error-control codes
(here abbreviated ECC) like Bose-Chaudhuri-Hocquenghem codes (called BCH
codes), Reed-Solomon codes (called RS codes) and Goppa codes. The reader
is referred to F.J. MacWilliams, N.J.A. Sloane, "The Theory of
Error-Correcting Codes", Amsterdam: North-Holland, 1977, for details on
the theory of ECC and an introduction to the theory of finite fields. The
book by R.E. Blahut, "Theory and Practice of Error Control Codes",
Cambridge, MA: Addison-Wesley, 1984, gives another treatment of the same
theories with emphasis on the practical aspects.

The main parameters of an ECC are the block length n, the number of
information symbols k (also called the dimension) and the minimum
(Hamming) distance d between any two codewords of the code. A code with
minimum distance d is capable of correcting t errors and s erasures as
long as 2t + s ≤ d-1. ECCs are very useful in practice for improving the
reliability of a noisy communication channel. However, different
applications require different codes with different parameters n, k, d.
These parameters are all directly or indirectly related to the number
(= 2^m) of elements of the Galois field GF(2^m). For example, the maximum
block length n of an RS code is 2^m + 1. This means that, if we are
constrained to use one single Galois field, we are also limited in our
selection of ECC.

Building dedicated hardware for every code of practical interest is
obviously unreasonable, although dedicated hardware can sometimes be
motivated by standardization and/or by extreme speed requirements. In
many other situations a flexible, programmable device capable of
implementing different codes over different Galois fields would be the
most appropriate choice. The most crucial single unit in a device capable
of providing the aforementioned flexibility is a fast universal Galois
field multiplier (here abbreviated UGM) capable of operating over a
number of different Galois fields. Indeed, multiplication is by far the
most common operation occurring in the encoding/decoding procedures of,
for example, BCH and RS codes. Successive multiplications can also be
used to compute the inverse of a field element. Inversion is required in
the decoding of, for example, BCH and RS codes.
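
As a minimal sketch of the remark on inversion (an illustration only, not
part of the described apparatus): every non-zero element A of GF(2^m)
satisfies A^(2^m - 1) = 1, so the inverse equals A^(2^m - 2) and can be
reached with repeated multiplications. The names `gf2m_mul` and
`gf2m_inv`, the integer encoding (bit i holds the coefficient of x^i) and
the software shift-and-add multiplier are assumptions made for this
example.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m); poly is P(x) as an integer bit mask."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a                # add x^i * A(x) mod P(x)
        a <<= 1                   # multiply the running term by x
        if (a >> m) & 1:
            a ^= poly             # reduce modulo P(x)
    return r


def gf2m_inv(a, poly, m):
    """Inverse of a non-zero element as A^(2^m - 2) = A^2 * A^4 * ... * A^(2^(m-1))."""
    result = 1
    power = a
    for _ in range(m - 1):
        power = gf2m_mul(power, power, poly, m)    # successive squarings
        result = gf2m_mul(result, power, poly, m)
    return result


if __name__ == "__main__":
    P = 0b10011                   # x^4 + x + 1, generating GF(2^4)
    for a in range(1, 16):
        assert gf2m_mul(a, gf2m_inv(a, P, 4), P, 4) == 1
```

The sketch needs 2(m-1) multiplications per inversion, which illustrates
why a fast multiplier dominates the cost of decoding.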

A prior art UGM has resulted in a cellular-array multiplier which is too
slow to be really practical. The poor performance of the prior art UGM is
due to a worst signal path of about 6m levels of logic when the UGM is
operated over GF(2^m). Details on the prior art UGM are found in B.A.
Laws, C.K. Rushforth, "A Cellular-Array Multiplier for GF(2^m)", IEEE
Trans. Comput., Vol. C-20, pp. 1573-1578, December 1971.

The principal object of the invention is to provide a novel apparatus for
computing products of elements belonging to a Galois field GF(p^m), with
emphasis on the case p = 2. The new apparatus has fewer components and
higher speed than prior art apparatus.

It is a feature of this invention to be programmable for operation over
any Galois field GF(p^m) with 2 ≤ m ≤ M, where M is an arbitrary positive
integer greater than one.

The invention, as well as the embodiments thereof, is defined in the
appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of apparatus according to a preferred
organization.

FIG. 2 is a more detailed block diagram of a sub-unit of apparatus used
to compute α·A over different fields of characteristic two.

FIG. 3 is yet a more detailed block diagram of a sub-unit of apparatus
used to compute the inner product of two binary vectors.

FIG. 4 is an example of apparatus for the fields GF(2^m), 2 ≤ m ≤ 4.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The discussion of apparatus requires a review of some basic properties of
a Galois field. A Galois field GF(p^m) is an algebraic finite field
consisting of p^m elements, where p is a prime and m a positive integer.
Among the field elements are included the null element, 0, and the unit
element, 1. The operations of addition, subtraction, multiplication and
division are defined on the elements of the field. Addition and
multiplication are associative and commutative, and multiplication is
distributive with respect to addition and subtraction. Further, any of
the four aforementioned operations always results in an element of the
field (division by the null element excepted).

The present invention is primarily concerned with, but not limited to,
fields of characteristic two (i.e. p = 2), which are denoted by GF(2^m).
The smallest of these fields (m = 1) consists only of a null element 0
and a unit element 1 and is called the binary field GF(2). Addition and
multiplication in GF(2) are performed modulo 2, i.e. 0+0 = 1+1 = 0,
0+1 = 1+0 = 1, 0·0 = 0·1 = 1·0 = 0, 1·1 = 1 and -1 = 1. Addition is thus
the same as exclusive-or (XOR) whereas multiplication is the same as
logical AND.
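
The GF(2) rules just listed can be checked with a few lines of Python; the
function names `gf2_add` and `gf2_mul` are hypothetical and serve only
this illustration.

```python
def gf2_add(a, b):
    """Addition in GF(2): exclusive-or of two bits."""
    return a ^ b


def gf2_mul(a, b):
    """Multiplication in GF(2): logical AND of two bits."""
    return a & b


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            # Both operations agree with integer arithmetic modulo 2.
            assert gf2_add(a, b) == (a + b) % 2
            assert gf2_mul(a, b) == (a * b) % 2
    # -1 = 1 in GF(2): adding 1 to itself gives the null element.
    assert gf2_add(1, 1) == 0
```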

In GF(2^m), m > 1, each element can be represented by a polynomial of
degree m-1 or less with binary coefficients. Each element is a residue
modulo an irreducible polynomial of degree m over GF(2), and all
arithmetic operations on the coefficients are performed modulo 2.
Alternatively, the field GF(2^m) can be seen as a linear vector space
over GF(2) of dimension m (in which case it should be denoted GF(2)^m).

For each integer m there exists only one finite field with 2^m elements
(this is true in general for fields of any characteristic). In general,
however, there exist several different representations of the elements of
a finite field. The particular representation is given by the particular
irreducible polynomial chosen to generate the finite field.

Representing an element A as a polynomial a_0 + a_1 x + ... +
a_{m-2} x^{m-2} + a_{m-1} x^{m-1} corresponds to choosing the set of field
elements {1, α, α^2, ..., α^{m-1}} as a basis of GF(2^m). Every element
can thus be expressed as a linear combination of the basis elements. In
particular, the elements α^i, i = 0, 1, ..., m-1 are represented in this
basis by the polynomials x^i, i = 0, 1, ..., m-1, and the expression
a_0 + a_1 x + ... + a_{m-2} x^{m-2} + a_{m-1} x^{m-1} is equivalent to
a_0 + a_1 α + ... + a_{m-1} α^{m-1}. The type of basis discussed above is
naturally called the polynomial basis.

In the following we call P(x) the irreducible polynomial generating the
field and which has the field element α as a root, i.e. P(α) = 0. A(x) is
the polynomial associated with the field element A, B(x) the polynomial
associated with the field element B and C(x) the polynomial associated
with the product of A and B. Then the product is given by the following
expression

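$$
\begin{aligned}
C(x) &= A(x)\,B(x) \bmod P(x) \\
     &= [\,b_0 A(x) + b_1 x A(x) + \dots + b_{m-1} x^{m-1} A(x)\,] \bmod P(x) \\
     &= [\,b_0 A(x) \bmod P(x)\,] + [\,b_1 x A(x) \bmod P(x)\,] + \dots
        + [\,b_{m-1} x^{m-1} A(x) \bmod P(x)\,]. \qquad (1)
\end{aligned}
$$

We define now the polynomials Z_i(x) as follows:

$$
Z_i(x) = \sum_{j=0}^{m-1} z_{i,j}\, x^j = x^i A(x) \bmod P(x),
\qquad i = 0, 1, \dots, m-1, \qquad (2)
$$

where z_{i,j} ∈ GF(2). Then

$$
C(x) = \sum_{i=0}^{m-1} b_i\, Z_i(x). \qquad (3)
$$

And in matrix notation

$$
C = \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{pmatrix}
  = \begin{pmatrix}
      z_{0,0}   & z_{1,0}   & \cdots & z_{m-1,0} \\
      z_{0,1}   & z_{1,1}   & \cdots & z_{m-1,1} \\
      \vdots    & \vdots    &        & \vdots    \\
      z_{0,m-1} & z_{1,m-1} & \cdots & z_{m-1,m-1}
    \end{pmatrix}
    \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{m-1} \end{pmatrix}
  = Z \cdot B \qquad (4)
$$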

where Z is the m by m binary matrix in equation (4). We see that the
product C can be obtained by computing the m inner products Z_{·,j}·B,
j = 0, 1, ..., m-1, where Z_{·,j} denotes the j:th row of Z. First,
though, the entries of Z have to be generated and this can be done as
follows. We generate the m columns of Z simultaneously by cascading m-1
identical cells where each cell implements the operation xA(x) mod P(x)
(the first column Z_{0,·} is the element A itself, see equation (2)). We
call such a cell the α-cell and the cascaded structure the α-array.
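
The two-step scheme (columns of Z from cascaded multiplications by x, then
one inner product per row) can be sketched in software as follows. This is
a behavioural model under stated assumptions, not the hardware itself: the
function names, the integer encoding (bit j holds the coefficient of x^j)
and the sequential loop in place of a parallel array are choices made for
the example.

```python
def alpha_multiples(a, poly, m):
    """Return the columns Z_0, ..., Z_{m-1}, i.e. x^i A(x) mod P(x)."""
    cols = [a]
    for _ in range(m - 1):
        a <<= 1                          # multiply by x
        if (a >> m) & 1:
            a ^= poly                    # reduce modulo P(x)
        cols.append(a)
    return cols


def multiply_via_z(a, b, poly, m):
    """Product C = Z.B: coefficient c_j is the inner product of row j and B."""
    cols = alpha_multiples(a, poly, m)
    c = 0
    for j in range(m):
        cj = 0
        for i in range(m):
            z_ij = (cols[i] >> j) & 1    # entry z_{i,j} of the matrix Z
            b_i = (b >> i) & 1
            cj ^= z_ij & b_i             # AND for product, XOR for sum
        c |= cj << j
    return c


if __name__ == "__main__":
    P = 0b10011                          # x^4 + x + 1
    # alpha^3 * alpha = alpha^4 = alpha + 1
    assert multiply_via_z(0b1000, 0b0010, P, 4) == 0b0011
```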

The polynomial P(x) used to generate the field is of the form
x^m + x^{m-1} p_{m-1} + ... + x p_1 + 1 (the first and last coefficient
must necessarily be ones if P(x) is to be irreducible). Then the
expression xA(x) mod P(x) can be written as follows:

$$
\begin{aligned}
x \cdot A(x) &= x^m a_{m-1} + x^{m-1} a_{m-2} + \dots + x^2 a_1 + x a_0 \\
             &= a_{m-1} + \sum_{i=1}^{m-1} (a_{m-1} p_i + a_{i-1})\, x^i
                \pmod{P(x)}. \qquad (5)
\end{aligned}
$$

In equation (5) we have utilized the fact that
α^m = α^{m-1} p_{m-1} + ... + α p_1 + 1 (or, equivalently,
x^m = x^{m-1} p_{m-1} + ... + x p_1 + 1 modulo P(x)). Equation (5)
describes the function of the α-cell for fixed m: for each p_i ≠ 0,
i = 1, 2, ..., m-1, one sum a_{m-1} + a_{i-1} has to be computed, whereas
the coefficient of x^0 is the most significant coefficient a_{m-1}. We
call a_{m-1} the feedback (FB) signal.
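
A minimal sketch of one α-cell for fixed m, written directly from the
relation above: output coefficient 0 is the feedback a_{m-1}, and output
coefficient i is a_{m-1} p_i + a_{i-1}. The function name `alpha_cell` and
the list encodings (a[i] = a_i, p[i-1] = p_i) are assumptions made for
the example.

```python
def alpha_cell(a, p):
    """Compute x*A(x) mod P(x) over GF(2).

    a: list [a_0, ..., a_{m-1}] of bits.
    p: list [p_1, ..., p_{m-1}], the middle coefficients of P(x).
    """
    m = len(a)
    fb = a[m - 1]                        # feedback signal a_{m-1}
    out = [fb]                           # coefficient of x^0
    for i in range(1, m):
        out.append((fb & p[i - 1]) ^ a[i - 1])
    return out


if __name__ == "__main__":
    # GF(2^4) with P(x) = x^4 + x + 1, so (p_1, p_2, p_3) = (1, 0, 0).
    p = [1, 0, 0]
    alpha_cubed = [0, 0, 0, 1]
    # alpha * alpha^3 = alpha^4 = 1 + alpha
    assert alpha_cell(alpha_cubed, p) == [1, 1, 0, 0]
```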

Having described the mathematical preliminaries, a preferred embodiment
of the novel UGM will now follow.

A. Hardware

Fig. 1 shows the general structure of the novel UGM. The notation is
consistent with the previous section. Unit 1 is the α-array that
generates the entries of the matrix Z as defined in equation (4). Unit 2
computes the m inner products c_j = Z_{·,j}·B, j = 0, 1, ..., m-1 and is
here called the IP network. The IP network consists in turn of m
identical cells, where each cell, here called the IP-cell, computes one
inner product. The UGM requires the input field elements to have zeros in
the unused high-order positions, i.e. a_i = b_i = 0, i > m-1.

Fig. 2 shows a preferred implementation of the α-cell 11 for performing
the operation xA(x) mod P(x) (or, equivalently, α·A). The α-cell can be
programmed to operate over any of the fields GF(2^m), 2 ≤ m ≤ M by means
of the binary vectors P = (p_1, p_2, p_3, ..., p_{M-1}) and
S = (s_1, s_2, ..., s_{M-1}) shown in Fig. 2.

Suppose we want to program the UGM for operation over GF(2^m) where m is
a particular value in the usable range. Then the components of the vector
S are set as follows:

$$
s_i = \begin{cases} 0, & i \neq m-1 \\ 1, & i = m-1 \end{cases} \qquad (6)
$$

The vector S determines the feedback signal FB of Fig. 2. The first m-1
components of the vector P are the m-1 middle coefficients of the
irreducible polynomial P(x) chosen to generate the field. The remaining
coefficients p_m through p_{M-1} are, for example, set to zero.

We see in Fig. 2 that the α-cell has a regular bit-slice structure
consisting of m-1 identical subcells (unit 111 in Fig. 2). In each
subcell there is one binary adder (XOR), one switch SW and one
multiplexer MX. The switch SW in subcell #i is controlled by the signal
s_i in the following way: SW is closed if s_i = 1, SW is open if s_i = 0.
The multiplexer MX is controlled by the signal p_i in the following way:
if p_i = 1 then MX passes the signal coming from the binary adder
(= a_{m-1} + a_{i-1}), if p_i = 0 then MX passes the other input
(= a_{i-1}).
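
The programmable bit-slice behaviour can be modelled as below. This is a
sketch under assumptions that the text does not fully settle: the switch
in subcell #i is taken to route a_i onto the FB line when s_i = 1, so that
FB = a_{m-1} when only s_{m-1} is set, and the vectors are stored as
Python lists with index 0 unused. The name `alpha_cell_prog` is
hypothetical.

```python
def alpha_cell_prog(a, p, s):
    """One programmable alpha-cell of width M over GF(2).

    a: list [a_0, ..., a_{M-1}] of bits (unused high positions are zero).
    p: list with p[i] = p_i for i = 1..M-1 (p[0] is unused).
    s: list with s[i] = s_i for i = 1..M-1 (s[0] is unused).
    """
    M = len(a)
    fb = 0
    for i in range(1, M):
        fb |= s[i] & a[i]                # closed switch puts a_i on FB (assumed)
    out = [fb]
    for i in range(1, M):
        # Multiplexer MX: adder output when p_i = 1, plain shift otherwise.
        out.append((fb ^ a[i - 1]) if p[i] else a[i - 1])
    return out


if __name__ == "__main__":
    # M = 4 hardware programmed for GF(2^3) with P(x) = x^3 + x + 1.
    p = [0, 1, 0, 0]                     # p_1 = 1, p_2 = 0, p_3 unused (zero)
    s = [0, 0, 1, 0]                     # only s_{m-1} = s_2 is set
    out = alpha_cell_prog([0, 0, 1, 0], p, s)   # input is alpha^2
    assert out[:3] == [1, 1, 0]          # alpha^3 = 1 + alpha
```

In this model the output position above m-1 can carry a non-zero value
(here out[3] = 1); such values only reach unused product lines.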
Fig. 3 shows a preferred implementation of the IP-cell 21 based on
two-input gates. M AND gates and M-1 XOR gates are required. The
multiplexer MX appended to the output of the IP-cell is required to zero
the product coefficients c_i for i > m-1 since these are not used. In
this case the signal v_i is the i:th component of a vector
V = (v_0, v_1, ..., v_{M-1}) that could be set as follows

$$
v_i = \begin{cases} 0, & i \le m-1 \\ 1, & i > m-1 \end{cases} \qquad (7)
$$

The multiplexer MX would then zero the output if v_i = 1. If v_i = 0 the
output of the XOR-tree is selected.
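
A behavioural sketch of one IP-cell with its output multiplexer; the
function name `ip_cell` and the list encoding are assumptions, and the
sequential loop stands in for the AND gates feeding an XOR-tree.

```python
def ip_cell(row, b, v):
    """Inner product over GF(2) of two M-bit vectors, gated by control bit v.

    row: list of M bits (one row of Z).
    b:   list of M bits (the second operand).
    v:   control bit; v = 1 forces the output to zero.
    """
    acc = 0
    for z, bit in zip(row, b):
        acc ^= z & bit                   # AND gate, then XOR accumulation
    return 0 if v else acc               # multiplexer MX


if __name__ == "__main__":
    assert ip_cell([1, 1, 0, 1], [1, 0, 0, 1], 0) == 0   # 1 + 1 = 0 in GF(2)
    assert ip_cell([1, 1, 0, 1], [1, 1, 0, 1], 0) == 1
    assert ip_cell([1, 1, 0, 1], [1, 1, 0, 1], 1) == 0   # zeroed by MX
```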

Fig. 4 shows the complete UGM for the case M = 4 together with a table of
values for the vectors S and V for 2 ≤ m ≤ 4. Notice that m ≥ 2 implies
that the first two components v_0 and v_1 of V are always zero and need
not be generated (the multiplexer could be skipped in those IP-cells).
The field generator P(x) is not indicated but can be chosen as follows:
P(x) = x^4 + x + 1 for m = 4, P(x) = x^3 + x + 1 for m = 3 and
P(x) = x^2 + x + 1 for m = 2. The extension to a new value of M is
straightforward.

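Operating the UGM for m < M means that only a part of the α-array is
used. This fact can be easily illustrated with the help of equation (4).
First we define the vectors C_L, C_U, B_L and B_U as follows

$$
C_L = (c_0, c_1, \dots, c_{m-1})^T, \quad
C_U = (c_m, \dots, c_{M-1})^T, \quad
B_L = (b_0, b_1, \dots, b_{m-1})^T, \quad
B_U = (b_m, \dots, b_{M-1})^T
$$

where the superscript T indicates transposition. Then we have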



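$$
\begin{pmatrix} C_L \\ C_U \end{pmatrix}
= \left(\begin{array}{c|c}
    \begin{matrix} Z_1 \\ Z_2 \end{matrix} & Z_3
  \end{array}\right)
  \begin{pmatrix} B_L \\ B_U \end{pmatrix} \qquad (8)
$$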



where Z_1, Z_2 and Z_3 are submatrices of Z defined according to the
subdivision of Z indicated in equation (8). The product of interest for
us is Z_1·B_L and we want it to appear on the lines of C_L. To have this
product correctly computed we must ensure that the product Z_3·B_U is
always zero. But this is the case since B_U is required to be zero. What
remains to take care of is the product Z_2·B_L since this is normally
non-zero and it would appear on the lines of C_U (the unused lines that
we wish to be zero). The zeroing of these lines is done through the
multiplexer MX and the control signal v_i mentioned above, and shown in
Fig. 3.
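
To make the zeroing argument concrete, the following sketch models a
complete M = 4 UGM and programs it for m = 2, 3 and 4 with the generator
polynomials named for Fig. 4. It is a behavioural model only. The names
`program`, `alpha_cell` and `ugm_multiply`, the integer encoding of field
elements, and the assumption that s_i = 1 taps a_i as feedback are choices
made for the illustration.

```python
M = 4  # hardware width, as in the Fig. 4 example


def program(m, middle):
    """Control vectors for GF(2^m); middle = [p_1, ..., p_{m-1}]."""
    p = [0] + list(middle) + [0] * (M - m)           # p[i] = p_i, rest zero
    s = [1 if i == m - 1 else 0 for i in range(M)]   # assumed feedback select
    v = [0 if i <= m - 1 else 1 for i in range(M)]   # zero unused outputs
    return p, s, v


def alpha_cell(a, p, s):
    """One programmable alpha-cell acting on M-bit coefficient lists."""
    fb = 0
    for i in range(1, M):
        fb |= s[i] & a[i]
    return [fb] + [(fb ^ a[i - 1]) if p[i] else a[i - 1] for i in range(1, M)]


def ugm_multiply(a, b, m, middle):
    """Multiply integers a, b < 2^m (bit i = coefficient of x^i)."""
    p, s, v = program(m, middle)
    cols = [[(a >> i) & 1 for i in range(M)]]        # first column is A
    for _ in range(M - 1):
        cols.append(alpha_cell(cols[-1], p, s))      # cascade of M-1 cells
    b_bits = [(b >> i) & 1 for i in range(M)]
    c = 0
    for j in range(M):                               # one IP-cell per row
        acc = 0
        for i in range(M):
            acc ^= cols[i][j] & b_bits[i]
        if not v[j]:                                 # MX zeroes unused lines
            c |= acc << j
    return c


if __name__ == "__main__":
    assert ugm_multiply(2, 2, 2, [1]) == 3           # GF(4): alpha^2 = alpha + 1
    assert ugm_multiply(4, 2, 3, [1, 0]) == 3        # GF(8): alpha^3 = alpha + 1
    assert ugm_multiply(8, 2, 4, [1, 0, 0]) == 3     # GF(16): alpha^4 = alpha + 1
```

Without the mask v, the m = 3 case above would leave a stray 1 on line
c_3, which is the Z_2·B_L contribution discussed in the text.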

B. Complexity

The α-array consists of m-1 α-cells where each cell contains m-1 XOR
gates, m-1 switches and m-1 multiplexers. Since switches and
multiplexers are much simpler than XOR gates we approximate the
complexity of a switch-multiplexer pair by that of one XOR gate. Then the
complexity of the α-array can be estimated to 2(m-1)^2 gates. The
IP-network consists of m IP-cells where each cell contains 2m-1 gates, in
total 2m^2 - m gates for the IP-network. Finally we need 3m registers to
store the vectors P, S and V needed to program the UGM (these registers
are loaded from an external unit). The complexity N_UGM of the whole UGM
can therefore be estimated by

N_UGM ≈ 2(m-1)^2 + 2m^2 - m + 3m.

Compared to a prior art UGM with complexity ≈ 7m^2 + 3m the present UGM
requires about 50% fewer components.
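
The estimate can be evaluated numerically. The sketch below assumes the
prior-art figure is to be read as 7m^2 + 3m (the exponent is not legible
in every copy of the text); the function names are hypothetical.

```python
def ugm_gate_estimate(m):
    """Estimated component count of the present UGM for field degree m."""
    alpha_array = 2 * (m - 1) ** 2       # m-1 cells, about 2(m-1) gates each
    ip_network = 2 * m * m - m           # m cells, 2m-1 gates each
    registers = 3 * m                    # vectors P, S and V
    return alpha_array + ip_network + registers


def prior_art_estimate(m):
    """Prior-art complexity, assuming the reading 7m^2 + 3m."""
    return 7 * m * m + 3 * m


if __name__ == "__main__":
    for m in (4, 8, 16):
        new, old = ugm_gate_estimate(m), prior_art_estimate(m)
        print(m, new, old, round(1 - new / old, 2))
```

For m = 8 the two estimates are 242 and 472 components, a saving of
roughly one half under these assumptions.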

C. Performance 

The performance of the UGM is directly related to the worst signal path
(WSP) between any input and any output of the UGM. We will give an upper
bound on the length L_WSP (in gates) of the WSP. In doing this we
approximate the delay of a switch-multiplexer pair by that of one XOR
gate.

The WSP through the UGM must go through m-1 α-cells and one IP-cell. The
length of the WSP through the IP-cell is fixed and it is easily found to
be 1 + ⌈log2 M⌉ gates.

The WSP through the α-array depends on the choice of P(x). It consists,
however, of three parts: switches, XOR gates and multiplexers. The number
of XOR gates along the WSP can be much less than m-1 by a smart choice of
P(x). The following is a table of the number of XOR gates along the WSP
through the α-array for some good P(x) and m ≤ 16:
  m     P(x)             # of XOR gates
  2     2,1,0            1
  3     3,1,0            1
  4     4,1,0            1
  5     5,2,0            2
  6     6,1,0            1
  7     7,1,0            1
  8     8,5,3,2,0        4
  9     9,4,0            2
  10    10,3,0
  11    11,2,0           2
  12    12,8,5,1,0       4
  13    13,7,6,1,0       4
  14    14,9,7,2,0       4
  15    15,1,0           1
  16    16,11,6,5,0      5



In the table we indicate only the powers of x in P(x) whose coefficients
are non-zero. We see that the number of XOR gates is at least one and at
most m/2 for m ≤ 8. For m > 8 a better upper bound seems to hold, but we
use m/2 as the upper bound for all m.

The number of switches and multiplexers along the WSP is not easily
determined exactly. We assume the worst case and say therefore that the
WSP goes through m-1 switches and m-1 multiplexers. According to the
approximation above this corresponds to about m-1 XOR gates.
The total length L_WSP of the WSP can now be upper bounded by

L_WSP ≤ (m-1) + m/2 + 1 + ⌈log2 M⌉ = 1.5m + ⌈log2 M⌉ [gates]

which is considerably better than the ≈ 6m gates of a prior art UGM.
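
A small numerical check of the bound, as a sketch: the function name
`wsp_upper_bound` is hypothetical, and ⌈log2 M⌉ is computed with integer
arithmetic to avoid floating-point rounding.

```python
def wsp_upper_bound(m, M):
    """Upper bound (in gate delays) on the worst signal path of the UGM."""
    ceil_log2_M = (M - 1).bit_length()   # equals ceil(log2(M)) for M >= 1
    switches_and_muxes = m - 1           # counted as m-1 XOR-gate delays
    xor_gates = m / 2                    # bound taken from the table
    ip_cell = 1 + ceil_log2_M            # one AND level plus the XOR-tree
    return switches_and_muxes + xor_gates + ip_cell


if __name__ == "__main__":
    m = M = 8
    assert wsp_upper_bound(m, M) == 1.5 * m + 3
    print(wsp_upper_bound(m, M), "gates versus about", 6 * m, "for the prior art")
```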

D. Comments

One skilled in the art will immediately recognize that several changes
could be made in the above design without departing from the basic
structure. For example, instead of storing the three vectors P, S and V
in registers one could design some simple logic that generates both S and
V from P (in this case also the highest coefficient of P(x) must be
entered into the UGM). The programming of the UGM would thus be
simplified to one single operation instead of three. The UGM is also
easily modified to perform the operation A·B + D by adding one input and
one XOR gate to each IP-cell. Further, the design of the sub-cell 111 can
alternatively be done by using an AND gate instead of the multiplexer MX.
The AND gate computes the product a_{m-1} p_i. This product then enters
the XOR gate (instead of the feedback signal a_{m-1}) to produce the sum
a_{m-1} p_i + a_{i-1}.

The same general structure of Fig. 1 can be utilized for UGMs operating
over fields of characteristic other than two. Only the details get
slightly more complicated since all coefficient operations must be
performed modulo the prime p, p > 2, that is, the XOR gate becomes a
mod p-adder and the AND gate a mod p-multiplier. Further, for prime p > 2
we have -1 ≠ 1 mod p, which means that signs must be considered. For
example, suppose P(x) is a monic (i.e. with the highest coefficient
p_M = 1) irreducible polynomial of degree M over GF(p) that has α as a
root, i.e. P(α) = 0. Then

$$
\alpha^M = -\alpha^{M-1} p_{M-1} - \dots - \alpha p_1 - p_0
         = \alpha^{M-1} p'_{M-1} + \dots + \alpha p'_1 + p'_0 \qquad (9)
$$

where p'_i is the additive inverse of p_i in GF(p). Now equation (5)
becomes

$$
x \cdot A(x) = p'_0\, a_{M-1} + \sum_{i=1}^{M-1} (p'_i\, a_{M-1} + a_{i-1})\, x^i
\pmod{P(x)}. \qquad (10)
$$

The design of the α-cell follows directly from equation (10). The α-cell
consists of M-1 identical sub-cells where each sub-cell performs the
operation p'_i a_{M-1} + a_{i-1}, plus one cell for computing
p'_0 a_{M-1}, where juxtaposition means modulo p-multiplication and +
means modulo p-addition. Since P is known in advance the additive
inverses p'_i, i = 0, 1, 2, ..., m-1 can be precomputed and input to the
multiplier instead of the original coefficients p_i. The α-cell is made
programmable for operation over different fields GF(p^m), 2 ≤ m ≤ M just
the same way as for p = 2 by means of switches and the control vector S.
The new α-array is obtained simply by cascading M-1 α-cells just as
before. The α-array is connected to the IP network as before to compute
the necessary inner products. The IP cell is modified to compute the
inner product of two p-ary vectors of length M. The control vector V is
used as for p = 2. The vectors S and V can either be stored in registers
which are loaded from outside or they can be derived from the
coefficients of P (in fact only the position of the highest coefficient
p_m is relevant to this purpose) by some simple logic. We notice finally
that the binary representation of each coefficient will require ⌈log2 p⌉
bits. For example, the three elements of GF(3) require two bits.
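
A sketch of the α-cell for odd characteristic, following the sub-cell
operation described above with precomputed additive inverses. The function
name `alpha_cell_p`, the list encodings and the choice of GF(9) generated
by x^2 + 1 over GF(3) as a test field are assumptions made for the
illustration.

```python
def alpha_cell_p(a, coeffs, p):
    """Compute x*A(x) mod P(x) over GF(p) for a monic P(x) of degree M.

    a:      list [a_0, ..., a_{M-1}] with entries in 0..p-1.
    coeffs: list [p_0, ..., p_{M-1}], the non-leading coefficients of P(x).
    """
    M = len(a)
    neg = [(-c) % p for c in coeffs]     # additive inverses p'_i, precomputed
    fb = a[M - 1]                        # feedback coefficient a_{M-1}
    out = [(neg[0] * fb) % p]            # the extra cell computing p'_0 a_{M-1}
    for i in range(1, M):
        out.append((neg[i] * fb + a[i - 1]) % p)
    return out


if __name__ == "__main__":
    # GF(9) = GF(3)[x] / (x^2 + 1); x^2 + 1 has no root in GF(3).
    coeffs = [1, 0]
    assert alpha_cell_p([0, 1], coeffs, 3) == [2, 0]   # alpha^2 = -1 = 2
    assert alpha_cell_p([1, 2], coeffs, 3) == [1, 1]   # alpha(1 + 2 alpha) = 1 + alpha
```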
Accordingly, it is intended that all matter contained in the above
descriptions and the following drawings shall be interpreted as illustrative
and not in a limiting sense.

CLAIMS

1. A multiplier for performing multiplication of two elements in the
finite field GF(p^m) with p^m elements, and obtaining a product vector of
m p-ary components, where m is an integer equal to or greater than 2 or
equal to or less than M, where M is an integer equal to or greater than
2, each of said p^m elements of GF(p^m) represented by a vector of m
p-ary coefficients according to a polynomial basis representation,
characterized by

a) first logic means (1) including a cascade of at least one α-cell (11)
for developing for the first of said two elements the first m
α-multiples, each α-multiple being the product of α^i and said element
for i = 0, 1, 2, ..., m-1, where α is an element of the field GF(p^m)
satisfying the equation P(x) = 0 for x = α, where P(x) is a polynomial of
degree m which is irreducible over the field GF(p); and

b) second logic means (2) including at least two IP cells (21), where
each IP cell will simultaneously develop the inner product of the
second element and every p-ary vector whose components are the j:th
components of all said α-multiples for j = 0, 1, 2, ..., m-1, each of
said m inner products being one component of said product vector.

2. The multiplier recited in claim 1 wherein:

a) said first logic means (1) comprise means for changing of said
irreducible polynomial, whereby said first logic means are
programmable for operation over any of said finite fields GF(p^m),
2 ≤ m ≤ M, including all possible representations of said finite fields;
and

b) means for selectively connecting the output of said second logic
means (2) to a logical zero.

3. The multiplier recited in claim 1 or 2 wherein each of the p^m
elements of GF(p^m) is represented by a vector of m p-ary components
according to a polynomial basis representation of the form
A = a_0 + a_1 α + ... + a_{m-2} α^{m-2} + a_{m-1} α^{m-1}, where A is an
element of GF(p^m), a_0, a_1, ..., a_{m-2}, a_{m-1} are the p-ary
components of A, and α is an element of GF(p^m) satisfying the equation
P(x) = 0 for x = α, where P(x) is a polynomial of degree m which is
irreducible over the field GF(p).

4. The multiplier recited in claim 2 or 3 wherein:

a) the unused inputs of said first logic means (1) are set to logical
zero; and

b) the unused inputs and outputs of said second logic means (2) are
set to logical zero.

5. The multiplier recited in claim 1, 2, 3 or 4 wherein p = 2.
FIG. 1